Aggregation-Based Structured Text Retrieval
Abstract
DEFINITION Text retrieval is concerned with retrieving documents in response to user queries. This is achieved by (i) representing documents and queries with indexing features that characterise their information content, and (ii) defining a function that uses these representations to perform retrieval. Structured text retrieval introduces a finer-grained retrieval paradigm that supports the representation and subsequent retrieval of the individual document components defined by the document’s logical structure.

Aggregation-based structured text retrieval defines (i) the representation of each document component as the aggregation of the representation of its own information content and the representations of the information content of its structurally related components, and (ii) the retrieval of document components based on these aggregated representations. The aim of aggregation-based approaches is to improve retrieval effectiveness by capturing and exploiting the interrelations among the components of semi-structured text documents.

The representation of each component’s own information content is generated at indexing time. These representations are then aggregated recursively at the level of their indexing features, which produces, either at indexing time or at query time, the representations of those components that are structurally related to other components. Aggregation can be defined in numerous ways; it is typically defined so that retrieval can focus on the document components most specific to the query or on each document’s best entry points, i.e., document components that contain relevant information and from which users can browse to further relevant components.
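The definition leaves the aggregation function open, so the following is only a minimal sketch of one possible instance, not the method of any particular system. It assumes that indexing features are term counts, that structurally related components are the direct children in the document tree, that aggregation is a weighted sum with a hypothetical `child_weight` parameter, and that scoring is the share of aggregated weight falling on query terms. The names `Component`, `aggregate`, `score`, and `rank` are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Component:
    """A node of the document's logical structure (hypothetical type)."""
    name: str
    own_terms: Counter  # representation of the component's own content
    children: List["Component"] = field(default_factory=list)


def aggregate(node: Component, child_weight: float = 0.5) -> Dict[str, float]:
    """Recursively combine a component's own term weights with the
    aggregated weights of its children, scaled by child_weight (assumed)."""
    rep: Dict[str, float] = {t: float(w) for t, w in node.own_terms.items()}
    for child in node.children:
        for term, weight in aggregate(child, child_weight).items():
            rep[term] = rep.get(term, 0.0) + child_weight * weight
    return rep


def score(rep: Dict[str, float], query_terms: List[str]) -> float:
    """Fraction of the aggregated weight that falls on query terms.
    Normalising by total weight favours components specific to the query."""
    total = sum(rep.values())
    if total == 0:
        return 0.0
    return sum(rep.get(t, 0.0) for t in query_terms) / total


def rank(root: Component, query_terms: List[str],
         child_weight: float = 0.5) -> List[Tuple[float, str]]:
    """Score every component in the tree and sort by descending score."""
    results: List[Tuple[float, str]] = []

    def visit(node: Component) -> None:
        rep = aggregate(node, child_weight)
        results.append((score(rep, query_terms), node.name))
        for child in node.children:
            visit(child)

    visit(root)
    return sorted(results, key=lambda r: (-r[0], r[1]))


# Toy document: an article with two sections.
sec1 = Component("sec1", Counter({"xml": 2, "retrieval": 1}))
sec2 = Component("sec2", Counter({"cooking": 3}))
article = Component("article", Counter({"intro": 1}), [sec1, sec2])

ranking = rank(article, ["xml", "retrieval"])
```

In this toy document the section about XML retrieval ranks above the whole article, because the article's aggregated representation is diluted by the unrelated section. This sketch aggregates at query time; the same `aggregate` results could instead be computed once and stored at indexing time.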
Similar resources
Image retrieval using the combination of text-based and content-based algorithms
Image retrieval is an important research field which has received great attention in the last decades. In this paper, we present an approach for the image retrieval based on the combination of text-based and content-based features. For text-based features, keywords and for content-based features, color and texture features have been used. Query in this system contains some keywords and an input...
Indexing Units
DEFINITION Indexing units refers to the granularity of information in the retrieval system’s index, which can be in principle any document part of a structured text, and as a consequence determines the possible units of retrieval. There are three basic approaches: The first approach is to index every potentially retrievable unit as a whole—the so-called element-based approach [13]. The second a...
A Model for the Representation and Focussed Retrieval of Structured Documents Based on Fuzzy Aggregation
Effective retrieval of structured documents should exploit the content and structural knowledge associated with the documents. This knowledge can be used to focus retrieval to the best entry points: document components that contain relevant information, and from which users can browse to retrieve further relevant components. To enable this, suitable representation methods must be developed. Thi...
TopX: efficient and versatile top-k query processing for text, structured, and semistructured data
TopX is a top-k retrieval engine for text and XML data. Unlike Boolean engines, it stops query processing as soon as it can safely determine the k top-ranked result objects according to a monotonous score aggregation function with respect to a multidimensional query. The main contributions of the thesis unfold into four main points, confirmed by previous publications at international conference...
Presenting Structured Text Retrieval Results
DEFINITION Presenting structured text retrieval results refers to the fact that, in structured text retrieval, results are not independent and a judgment on their relevance needs to take their presentation into account. For example, HTML/XML/SGML documents contain a range of nested sub-trees that are fully contained in their ancestor elements. As a result, structured text retrieval should make ...
Publication date: 2009